Abstract

I examine the consequences of modelling contagious influence in a social network 
with incomplete edge information, namely in the situation where each individual may 
name a limited number of friends, so that extra outbound ties are censored. In particu- 
lar, I consider a prototypical time series configuration where a property of the "ego" is 
affected in a causal fashion by the properties of their "alters" at a previous time point, 
both in the total number of alters as well as the deviation from a central value. This is 
considered with three potential methods for naming one's friends: a strict upper limit 
on the number of declarations, a flexible limit, and an instruction where a person names 
a prespecified fraction of their friends. I find that one of two distortions appears in the
estimation of these effects: either the size of the effect is inflated in magnitude,
or the estimators are centered about zero rather than related to the true
effect. The degree of heterogeneity in friend count is one of the major factors in
whether such an analysis can be salvaged by post-hoc adjustments.

In any design of a social network study, there are choices to be made about the declaration 
of friendships between individuals so that the putative network can be studied. This often 
comes in the form of survey information, where respondents are asked to list all their close 
friends or acquaintances, or measured from behavioural observation on how the individuals 
interact. In the case of studies of network contagion, in which an attribute of one person 
is potentially passed to one or more friends (an effect also known as "induction"), there is 
considerable interest in establishing a set of "strong" friendships that may form the backbone 
of the social network, through which one would potentially observe a contagious effect, such
as the adoption of a technology (Aral et al., 2009), medical service or preference Valente

Figure 1: The censoring process at work. Left: An uncensored 100-node binary network,
with mean degree 4 and mild heterogeneity on degree. Right: The same network when each 
individual can name no more than two friends. One of the consequences of this censoring is 
the isolation of more nodes from the giant component. 

While there are many issues regarding the perception of network ties that can distort
putative models of contagion (notably the thresholding/dichotomization problem raised by
Thomas and Blitzstein (2010b)), here I refer to a particular circumstance: the only friendships
are binary in nature (as assumed by most social network models), and the respondent or "ego"
is limited to naming a fixed number of their friends, or "alters", in the study in question (which
I label the "name-k-friends" limitation), resulting in the omission of a number of true ties -
often the vast majority of ties.

As a result, any changes in ego behaviour between time points that were due to these 
"invisible" connections may now be attributed to properties of the self (either in terms 
of personal characteristics or past values of the quantity of interest) or to properties of 
their visible network neighbours. A direct contagion effect across an unobserved edge would 
instead be detected in at least one of, if not all three of, a change in the autocorrelation (self) 
parameter, the effect of other visible ties, or the effect of higher-degree neighbours in the 
system - if the network is highly transitive (the friend of your friend is also your friend), then 
a person who is a true friend may appear instead as a second- or third-degree acquaintance,
and models and approaches that purport to show a higher-degree pattern may in fact be
picking up features that are consequences of lower-degree actions.

As another consequence, as seen in Figure 1, nodes that were originally in the giant
component can be disconnected if no one was able to name them and if they themselves named
no one in the system. These individuals may be unwittingly driving network behaviour if
contagious influence is present in the system, especially if their low popularity is connected 
to the trait that is undergoing contagious influence. 

This treatment focuses solely on mechanisms that operate at one degree only; that is, true
friends of friends have no direct influence. Additionally, there are no "false friends"; anyone
who is named as a friend was truly a friend to begin with. For cases when a mechanism that
censors out-degree is used, I show that there are two main consequences to this action when 
it comes to the estimates of coefficients in linear models: 

• There may be an inflation effect in the estimate of the coefficient size, so that the 
impact of any particular edge on an individual's outcome is measured to be considerably
higher than the true effect; this effect may be proportional to the fraction of ties that 
are observed, but can also be affected by other properties in the construction of the 
network. 

• There may be a complete disruption of the estimate of the coefficient size; rather 
than have an estimate that is proportional to the true underlying value, the estimate 
is instead due to the random (and independent) variation in the sampling mechanism, 
and is then centered about zero, with a variance that is an increasing function of the 
absolute true effect size. 

Whichever effect might be observed in the analysis is a product of the naming mechanism as 
well as the true distribution of outgoing ties across all individuals. 

I begin with examples of social network studies where out-degree censoring has been 
known to take place by design. I then simulate a series of binary social networks where a 
trait is observed on the individuals at two time steps, and make the second time step depend 
on both autoregressive and network-based processes. Each network naming model is then run 
with a series of censoring rules, and a linear model on the evolution of ego traits is run under 
each condition. The bias and coverage probabilities for estimators of the autocorrelated and 
network terms from each censored network are then compared to the "oracle" truth. I show 
that without adjusting for the censoring mechanism appropriately, the inferences made for 
each network effect are altered in unusual ways that will compromise many investigations. 



1 Background: "Name Your Best Friend" As A Network Generator



The National Longitudinal Study of Adolescent Health (better known as Add Health; Bearman
et al., 1997) is a study that, among other goals, aims to put health outcomes into
network contexts by starting with cohorts of students within secondary schools. In order to
collect information on friendship networks, participants were shown a school roster and asked
to name (up to) their five best male friends and five best female friends, for a maximum of ten
total friends. Romantic and sexual networks were also measured similarly, asking for "three
romantic and three non-romantic sexual partners", for a maximum of six partners (Morris,
2004).



For whatever reasons these constraints were put in place - to filter out less relevant 
friendships and more distant sexual relationships, for example - they were made in the design 
stage, well in advance of a major collection effort, and hence are beyond the ability of later 
investigators to expand without major modelling assumptions. Notably, 25% of students
name five friends of the same gender; 25% name five friends of the opposite gender, and 10%
name ten total friends (Jackson, 2009).



The Framingham Heart Study (FHS) has progressed for decades as a means of tracking
longitudinal medical data on a large community, but the potential of its social network
information has only recently been explored. A social network was constructed in Christakis
and Fowler (2007) after the investigators noted that a "close friend" was listed on the tracking
information for study participants in case of lost contact, and that a great number of these
friends were also in the FHS. Hence, in addition to family members, a friendship network
was constructed consisting of thousands of participants.

As the designers of the heart study did not plan the social network component of future
research, they did not anticipate the issue that would be caused by requesting only a single
close friend for contact (though a small number of individuals listed multiple contacts
regardless of the instruction). The consequences of this censoring of out-degree for friendships
have not been directly addressed in the original FHS network paper and follow-up works
(Christakis and Fowler, 2008; Fowler and Christakis, 2008); in particular, whether estimations
related to the relative distances between individuals would be affected by the addition of
censored ties. One of the marquee claims of this research has been a universal "three degrees
of influence" rule (Christakis and Fowler, 2009), but if these claims rest on a network where
a third-degree acquaintance is in reality a first-degree friend, there is considerable reason to
doubt its universality.



2 Simulation Models 



I demonstrate the impact of network structure, both real and censored, on the evolution of
a trait in a networked system. There are a large number of possibilities for models of this
form. I use the following notation for observables:

• $Y_{0i}$ and $Y_{1i}$ represent the observed trait on ego $i$ at times $t = \{0, 1\}$ respectively. $\bar{Y}_0$ is
the mean outcome over all individuals at time 0.

• $W_{ij}$ is a directional network tie, which in most applications is whether $i$ considers $j$ to
be a friend and is therefore a conduit for the spread of a trait. $D_i = \sum_j W_{ij}$ is the
outdegree for individual $i$, and $\bar{D}$ is the mean outdegree for all individuals.

There are a number of possible mechanisms for temporal influence on the trait in question. 

• Autocorrelation, such that the previous time point's trait will influence the present.
The simplest form of this would be

$$Y_{1i} = \mu + \gamma Y_{0i} + \epsilon_i,$$

so that the effect is moderated by the previous time value. The effect can also be with
respect to a different central point $c$, so that instead the model is $Y_{1i} = \mu' + \gamma (Y_{0i} - c) + \epsilon_i$,
where $\mu' = \mu + \gamma c$.

• Peer influence, wherein a trait of an individual $j$ affects that of individual $i$ at a later
time, so long as $i$ named $j$ as a friend ($W_{ij} = 1$). The equation would then be

$$Y_{1i} = \mu + \beta \sum_j W_{ij} Y_{0j} + \epsilon_i,$$

though the effect will be more difficult to identify. If there is a different pivot point
$d$, then the form is $Y_{1i} = \mu' + \beta \sum_j W_{ij} (Y_{0j} - d) + \epsilon'_i$, so that $\mu' = \mu + \beta d \bar{D}$ and
$\epsilon'_i = \epsilon_i + \beta d (D_i - \bar{D})$. This, at least, has a diagnosis for the problem if the total peer
contagion vector $WY_0$ is correlated with the trait vector $Y_0$, since this correlation can
be detected after fitting.

Instead of the same pivot point for each individual, it may be that the influence is
proportional to the difference in value between the alter and ego; this is essentially
a "drive toward homophily" so that connected individuals move their values closer to
each other. If this is the case, then the equation is

$$Y_{1i} = \mu + \beta \sum_j W_{ij} (Y_{0j} - Y_{0i}) + \epsilon_i,$$

and takes additional adjustment: namely, if the model fit is $Y_{1i} = \mu' + \beta \sum_j W_{ij} Y_{0j} + \epsilon'_i$,
then the adjustment becomes $\mu' = \mu - \beta \overline{DY_0}$ and $\epsilon'_i = \epsilon_i - \beta (D_i Y_{0i} - \overline{DY_0})$,
where $\overline{DY_0}$ is the mean of $D_i Y_{0i}$ over all individuals.

• Peer count/outdegree effect, so that the more friends a person has, the more their
trait will change in an interval of time. In the most basic form,

$$Y_{1i} = \mu + \delta D_i + \epsilon_i.$$

Again, the effect can also be with respect to a different central point $g$, so that instead
the model is $Y_{1i} = \mu' + \delta (D_i - g) + \epsilon_i$, where $\mu' = \mu + \delta g$.

These effects can become connected once more than one of these terms is introduced
simultaneously; as a consequence, changing the specification of one term can bias
terms other than the intercept $\mu$. Consider the case with both peer effects in
place,

$$Y_{1i} = \mu + \beta \sum_j W_{ij} (Y_{0j} - d) + \delta D_i + \epsilon_i;$$

for the sake of demonstration, I identify equilibrium points on the peer influence term only.
Suppose the true pivot on the contagion term was $d$ but the investigator chose a model
with a pivot of zero. Then, since $\sum_j W_{ij} = D_i$, the true generative equation would take the form

$$Y_{1i} = \mu + \beta \sum_j W_{ij} Y_{0j} - \beta d D_i + \delta D_i + \epsilon_i,$$

so that the coefficient fitted on $D_i$ estimates $\delta - \beta d$ rather than $\delta$. Note that omitting the $D_i$ term from
this equation would correspond to fixing $\delta = \beta d$ in the equation, likely causing estimation
error on $\beta$ if the two are different and there is any observed correlation between the terms in
$WY_0$ and $Y_0$.
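
The pivot algebra can be checked numerically. The following is a minimal sketch, not the code used for this study: the network is a placeholder with independent ties, and all parameter values are illustrative assumptions. It relies only on the identity $\sum_j W_{ij}(Y_{0j} - d) = \sum_j W_{ij} Y_{0j} - d D_i$, and fits the zero-pivot model by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
mu, beta, delta, d = 1.0, 0.3, 0.5, 2.0      # illustrative values, not from the study

# Placeholder directed network (independent ties, no self-ties).
W = (rng.random((n, n)) < 0.02).astype(float)
np.fill_diagonal(W, 0.0)
D = W.sum(axis=1)                             # outdegree D_i
Y0 = rng.normal(size=n)

# Generative process: contagion term pivots about d, plus a friend-count term.
Y1 = mu + beta * (W @ (Y0 - d)) + delta * D + rng.normal(scale=0.1, size=n)

# Investigator's model: pivot of zero on the contagion term, D_i included.
design = np.column_stack([np.ones(n), W @ Y0, D])
mu_hat, beta_hat, delta_hat = np.linalg.lstsq(design, Y1, rcond=None)[0]

print(f"intercept {mu_hat:.3f}, contagion {beta_hat:.3f}, friend count {delta_hat:.3f}")
print(f"delta - beta * d = {delta - beta * d:.3f}")
```

Under these assumptions the fitted friend-count coefficient should sit near $\delta - \beta d$ while the contagion coefficient stays near $\beta$, which is the sense in which a misplaced pivot on one term leaks into another.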

2.1 Modelling the Process On The Network 

For the purpose of this investigation, I use the general evolution model of the form 



$$Y_{1i} = \mu + \gamma (Y_{0i} - \bar{Y}_0) + \beta \sum_j W_{ij} (Y_{0j} - \bar{Y}_0) + \delta (D_i - \bar{D}) + \epsilon_{1i}.$$

If this model faithfully represents the actual mechanism at work on the network, then
the consequences of censoring network ties will be apparent in the biases of estimating the
intercept $\mu$, the autocorrelation $\gamma$, the network contagion $\beta$ and the outdegree coefficient
$\delta$. Essentially, the censoring mechanism takes the true sociomatrix $W$ and produces a new
sociomatrix $X$, the condition being that the declared outdegree of each ego, $\sum_j X_{ij}$, is
bounded above at a constant value.

Since the linear model framework represents the geometry of the covariate space, I consider
two factors: how the generative mechanism of the true network relates to the prior trait value
$Y_0$, and how the censoring mechanism relates to the prior trait value given the existence of
the true network.
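
As a minimal sketch of this evolution model (an illustration under stated assumptions, not the study's own code), the functions below draw the time-1 trait from a given sociomatrix and then fit the same linear model by least squares. The names `evolve_trait` and `fit_evolution`, the placeholder network and the parameter values are all hypothetical.

```python
import numpy as np

def evolve_trait(W, Y0, mu, gamma, beta, delta, rng, noise_sd=0.5):
    """Draw Y1 from the linear evolution model given network W and prior trait Y0."""
    D = W.sum(axis=1)                      # outdegree D_i
    Y0c = Y0 - Y0.mean()                   # centre on the time-0 mean
    eps = rng.normal(scale=noise_sd, size=len(Y0))
    return mu + gamma * Y0c + beta * (W @ Y0c) + delta * (D - D.mean()) + eps

def fit_evolution(X, Y0, Y1):
    """Least-squares estimates (mu, gamma, beta, delta) using sociomatrix X."""
    D = X.sum(axis=1)
    Y0c = Y0 - Y0.mean()
    design = np.column_stack([np.ones(len(Y0)), Y0c, X @ Y0c, D - D.mean()])
    coef, *_ = np.linalg.lstsq(design, Y1, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 200
W = (rng.random((n, n)) < 10 / (n - 1)).astype(float)   # placeholder network
np.fill_diagonal(W, 0.0)
Y0 = rng.normal(size=n)
truth = np.array([0.5, 0.4, 0.1, 0.2])                  # mu, gamma, beta, delta
Y1 = evolve_trait(W, Y0, *truth, rng=rng)
oracle = fit_evolution(W, Y0, Y1)                       # fit with the true network
print(np.round(oracle, 3))
```

Passing the true `W` to `fit_evolution` gives the "oracle" fit; passing a censored matrix in its place gives the fit an investigator would actually obtain.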



2.2 A Network Simulation Model 

Because the geometric relationship of edge selection to trait value is primarily of interest, I
construct the network according to the following procedure:

• Determine the prior trait value $Y_0$ for all nodes in the system. For the sake of this
simulation, these will be independent draws from a standard $N(0, 1)$ distribution.

• Generate a term $\alpha_i$ for the gregariousness of an ego (see footnote 1), that is naturally heterogeneous,
so that some egos seek more friendships than others. For this simulation, these will be
independent draws from a normal distribution with pre-selected variance $\sigma_\alpha^2$ (see footnote 2).

• Generate a term for homophily $h$ with respect to the trait of interest: if two individuals
have similar values of the prior trait, they will be more likely to be connected if $h > 0$,
less likely if $h < 0$ (heterophily), and indifferent if $h = 0$ (isophily).

• Choose coefficients $r_{in}$ and $r_{out}$ for the raw covariances between the pairs $(Y_{0j}, W_{ij})$
and $(Y_{0i}, W_{ij})$ - the dependence on the initial trait value and the inbound/outbound
propensities for the existence of an edge respectively. Choose the values to respect the
upper bound $r_{in}^2 + r_{out}^2 < 1$.

• Define and set a baseline continuous edge value,

$$Z_{ij} \sim N\left(\alpha_i + r_{in} Y_{0j} + r_{out} Y_{0i} - h |Y_{0i} - Y_{0j}|,\ 1 - (r_{in}^2 + r_{out}^2)\right),$$

that will be the measure against which a binary edge is created. Create a binary
network by selecting a threshold value $\omega$, and define $W_{ij} = I(Z_{ij} > \omega)$; for the sake of
this demonstration, choose $\omega$ to fix the density of arcs in the underlying graph. Ensure
that there are no "self-edges" in the system by setting $W_{ii} = 0$ for all $i$.

1 This follows the notation of Holland and Leinhardt (1981), as elaborated on in Thomas and Blitzstein (2010a).

2 I do not induce an additional term for the differential "popularity" of an individual, or their tendency to
attract friendships.



This model is a primitive version of the generative mechanism proposed in Thomas and
Blitzstein (2010a), but is still sufficient to explore the geometry of the network with respect
to the prior trait value.
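
The construction above can be sketched as follows. This is an illustrative reading of the recipe rather than the study's own code: the function name `simulate_network` and its argument names are hypothetical, the latent variance is assumed to be $1 - (r_{in}^2 + r_{out}^2)$, and the threshold is taken as the empirical quantile of the off-diagonal latent values that yields the target mean outdegree.

```python
import numpy as np

def simulate_network(n, sigma_alpha, h, r_in, r_out, mean_outdegree, rng):
    """Return (W, Y0): a thresholded binary network and the prior trait."""
    assert r_in**2 + r_out**2 < 1, "need r_in^2 + r_out^2 < 1"
    Y0 = rng.normal(size=n)                               # prior trait, N(0, 1)
    alpha = rng.normal(scale=sigma_alpha, size=n)         # ego gregariousness
    mean = (alpha[:, None]                                # alpha_i (row = ego i)
            + r_in * Y0[None, :]                          # r_in * Y_0j
            + r_out * Y0[:, None]                         # r_out * Y_0i
            - h * np.abs(Y0[:, None] - Y0[None, :]))      # homophily penalty
    sd = np.sqrt(1.0 - (r_in**2 + r_out**2))
    Z = rng.normal(loc=mean, scale=sd)                    # latent edge values
    off_diag = ~np.eye(n, dtype=bool)
    # Threshold chosen so the arc density gives the target mean outdegree.
    omega = np.quantile(Z[off_diag], 1.0 - mean_outdegree / (n - 1))
    W = ((Z > omega) & off_diag).astype(int)              # no self-edges
    return W, Y0

rng = np.random.default_rng(2)
W, Y0 = simulate_network(n=100, sigma_alpha=1.0, h=0.5, r_in=0.2, r_out=0.2,
                         mean_outdegree=10, rng=rng)
print("mean outdegree:", W.sum(axis=1).mean())
print("outdegree s.d.:", W.sum(axis=1).std())
```

Raising `sigma_alpha` spreads the outdegrees further apart while the threshold keeps the mean fixed, which is the heterogeneity dial used throughout the later sections.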

For this study, 50,000 networks were simulated with 100 or 200 nodes with varying values
of each parameter. I standardize this so that each network has a mean outdegree of 10,
though similar results exist for networks of different mean degree. Recognizing that many
of these terms may or may not exist in a real network, I set any given parameter
to zero in half the simulations so as to induce a variety of possible networks and
evolutions.
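
One way to draw such a simulation setting is sketched below. The ranges are purely illustrative assumptions (the text does not state them), and `draw_parameters` is a hypothetical helper; only the "zero in half the simulations" rule and the choice of 100 or 200 nodes come from the description above.

```python
import numpy as np

def draw_parameters(rng):
    """One hypothetical simulation setting; each effect is zeroed with probability 1/2."""
    params = {
        "gamma": rng.uniform(0.0, 1.0),        # autocorrelation (illustrative range)
        "beta": rng.uniform(-0.5, 0.5),        # contagion (illustrative range)
        "delta": rng.uniform(-0.5, 0.5),       # outdegree effect (illustrative range)
        "sigma_alpha": rng.uniform(0.0, 2.0),  # heterogeneity in gregariousness
        "h": rng.uniform(-1.0, 1.0),           # homophily (+) or heterophily (-)
    }
    for name in params:
        if rng.random() < 0.5:                 # switch the term off in about half the runs
            params[name] = 0.0
    params["n"] = int(rng.choice([100, 200]))  # network size, as in the text
    return params

rng = np.random.default_rng(7)
settings = [draw_parameters(rng) for _ in range(2000)]
share_zero_beta = np.mean([s["beta"] == 0.0 for s in settings])
print(f"share of settings with beta = 0: {share_zero_beta:.3f}")
```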

The method of censoring outdegree is just as important to the analysis as the network 
construction and contagion models. I consider several different prototypical mechanisms for 
network censoring, and explore the consequences of each in detail.

Given that W is the true friendship matrix, let X be the friendship matrix retrieved due
to the censoring/naming process. Just as the construction of the whole uncensored network 
can depend on the value of the trait, the naming process (conditional on these ties) can also 
depend on the absolute value of the traits of the contacts, the difference in value between 
contacts (for heterophilous or homophilous naming), or it can be completely independent of 
them. 

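Due to the lack of information on the entire network, the apparent equation to estimate
will now be of the form

$$Y_{1i} = \mu + \gamma (Y_{0i} - \bar{Y}_0) + \beta \sum_j X_{ij} (Y_{0j} - \bar{Y}_0) + \delta (D^X_i - \bar{D}^X) + \epsilon_{1i},$$

where $D^X_i = \sum_j X_{ij}$ is the declared outdegree of ego $i$ and $\bar{D}^X$ is its mean over all individuals.
The consequences of each naming
mechanism will then alter the geometry of $D_i$, $WY_0$ and $Y_0$ respectively; it is the consequence
of these changes in geometry that I detail in the next several sections.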



3 Mechanism 1: Hard Upper Limits — the Standard "Name k Friends"

The total number of friendships is strictly limited to be no more than k, and each person
is required to name as many friends as they believe they have up to that limit. If each
participant has at least k friends, there will be an immediate difficulty: every declared
outdegree equals k, so the covariate in the "number of friends" term, $\delta (D_i - \bar{D})$,
reduces to a zero vector; this makes the effect of the number of friends non-identifiable.

The main burden of identification will be on two sources: the ungregarious (people with
fewer than k named friends), and rule-breakers (people who name more than the permitted
number of friends). In the case of "name one friend" (or k = 1), this may lead to a vastly
disproportionate degree of influence, since there are likely to be far fewer loners and
rule-breakers. If the resulting mean out-degree is close to one, the few zeroes and rule-breakers
will carry much of the weight in the regression, and will hypothetically add considerable bias
to the estimates of the effect of "friend count".
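
A minimal sketch of the hard limit and of the identification problem it creates. The helper `censor_hard` is hypothetical, and it assumes each ego picks which friends to name uniformly at random, independently of the trait; the dense placeholder network is chosen only so that every ego has at least one true friend.

```python
import numpy as np

def censor_hard(W, k, rng):
    """Each ego names at most k of their true friends, chosen uniformly at random."""
    X = np.zeros_like(W)
    for i in range(W.shape[0]):
        friends = np.flatnonzero(W[i])
        if len(friends) > k:
            friends = rng.choice(friends, size=k, replace=False)
        X[i, friends] = 1
    return X

rng = np.random.default_rng(3)
n = 100
W = (rng.random((n, n)) < 0.3).astype(int)   # dense placeholder: everyone has many friends
np.fill_diagonal(W, 0)

X = censor_hard(W, k=1, rng=rng)
declared = X.sum(axis=1)                     # declared outdegree under "name one friend"
centred = declared - declared.mean()         # the friend-count covariate after censoring

design = np.column_stack([np.ones(n), centred])
rank = np.linalg.matrix_rank(design)
print("distinct declared outdegrees:", np.unique(declared))
print("rank of [1, D - mean(D)]:", rank)
```

With every declared outdegree equal to k, the centred friend-count column is identically zero, so the design matrix loses a rank and the friend-count coefficient cannot be estimated.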

Figure 2 shows the estimates for each of the model parameters for this naming scheme, as
divided by the heterogeneity on outdegree in the uncensored case. There are several features
that are immediately evident for each of the parameters; the outdegree effect $\delta$ is summarized
in Section 3.1. What is most obvious in the other plots is that higher heterogeneity makes
the estimate for the peer contagion effect $\beta$ have higher variance, but reduces the variance
on the autocorrelation $\gamma$.

3.1 Low Heterogeneity on Degree Breaks Estimations in Strict 
Naming Schemes 

There appear to be three significant families of results as the level of
heterogeneity is varied, as seen in Figure 3:

1. Everyone has named a single friend, and thus the effect of relative outdegree is
unidentifiable.

2. One or two people have named no friends, and therefore hold most of the power relative
to the group; their personal outcomes then drive the estimates of $\delta$, which tend negative,
though they are still distributed about zero.

3. Several people have named no friends, and the effect is better balanced between
individuals. The perceived effect size is then an "inflation" of the true effect size. This inflation
is typically close to the ratio of total friends to named friends; this ratio increases as
the number of censored friendships increases.

Figure 2: Scatterplots for the estimated values of $\mu$, $\beta$, $\gamma$ and $\delta$ across all simulations for the
hard upper limit of "name one friend". Colors refer to the degree of heterogeneity present in
the system (the pink vertical refers to nonidentifiable $\delta$ values). The red lines are along the
diagonal where the estimated values would equal the true generative values; the green line
represents the (expected) "inflation" of the effect size of $\delta$ by a factor of 10.

The conditions that create each of these situations will vary with the generative
parameters. Figure 3 shows how these conditions vary with two parameters: the initial
heterogeneity in outdegree between individuals, and the degree to which an individual's gregariousness
varies with their own value of $Y_0$. For the sake of visualization, I remove those simulations
where the true value of $\gamma$ is zero (as the number of trials at zero is identical to the number not equal
to zero, and the results are similar without the benefit of a spread on the y-axis).

When the average degree is far higher than the censoring level, and heterogeneity is quite
low, there will be very few if any people who have named zero friends, meaning that the effect
is either unidentifiable or highly inflated. In particular, those cases in which an effect was
identified, but meaningless, correspond to cases with low initial heterogeneity and a positive
correlation between gregariousness and their prior $Y_0$. This suggests that those individuals
with zero friends will also have low values of $Y_0$ (affecting the estimate of $\gamma$) and those with
at least one friend will tend to find those with high $Y_0$ (affecting the estimate of $\beta$).

The top-left panel of Figure 3 shows this in terms of colors: models with estimated
($\gamma < 0$, $\beta < 0$), in blue, are leading to artificially higher values of $\delta$, even when there is no true
effect of network size on the outcome value. Models with estimated ($\gamma < 0$, $\beta > 0$), in green,
appear in the majority where the true $\delta$ is negative, but the estimated value is positive.

It is worth noting that these effects virtually disappear in the case where heterogeneity
is high; the estimates of $\gamma$ return to being nearly exclusively positive (as seen in the bottom
right panel of Figure 3) as the estimates of $\delta$ revert to the "inflation" mode.

3.2 Heterogeneity Distorts The Contagion Effect $\beta$, and Homophily
on The Transmitted Attribute Biases the Autocorrelation $\gamma$

Homophily on the observed attribute - the notion that two individuals are more likely to 
have a connection between them if they have similar values of the attribute in question - 
is a factor believed to contribute to a great deal of confounding in social network studies. 
For this particular section, I consider how a network with homophilous (or heterophilous) 
characteristics will affect linear model inferences under the action of a strict name-one-friend 
scheme. 

Figure 3: Consequences of the strict name-one-friend procedure on the estimate of $\delta$, the
friend-count effect, under various generative conditions. Blue and green points are cases
when the estimate for the autocorrelation effect, $\gamma$, is negative (which it never truly is);
red and black points are cases when it is positive. Pink points represent cases when $\delta$ is
unidentifiable. Green lines represent a slope of 10, which is the "expected" inflation for
the removal of 9 of 10 friendship ties. Leftmost: zero additional heterogeneity with positive
selection on the trait of interest creates the "cone" effect: any potentially identifiable $\delta$ effects
appear to come from a distribution centered at zero with standard deviation proportional
to the true effect. Second from the left: once heterogeneity increases ($0 < \sigma_\alpha < 1$), the
inflated effect begins to appear, with many more unidentifiable scenarios. Second from
the right: with zero or negative correlation between in-degree and minimal heterogeneity,
the "cone" has disappeared, so that only the zeroes and inflation are present. Rightmost:
as heterogeneity increases, the number of zero-friend individuals increases, leaving only the
inflationary case; however, the expected inflation factor begins to rise.

Figure 4: Coverage properties of estimators for the contagion $\beta$ and autocorrelation $\gamma$. The
solid black line is the baseline from the standard $t_{95}$ distribution; other solid lines are for
high-heterogeneity cases, dashed for low-heterogeneity. Blue/cyan lines represent minimal
heterophily and homophily respectively; red and pink represent strong heterophily and
homophily; green lines are for isophily on the attribute. Numbers in the legend represent
measured coverage probabilities for an intended 95% confidence interval. For the contagion
$\beta$, measured for positive true values, there appears to be a minimal effect of homophily on the
coverage properties. If there is low heterogeneity on popularity, the estimates appear to be
biased downwards, but closer inspection of Figure 2 shows that this is in fact a disruption effect,
with the interval centered about zero. For the autocorrelation effect, increased homophily
or heterophily and lower heterogeneity cause coverage to be biased; high heterogeneity and
zero homophily is the one case to have coverage as advertised.

Since heterogeneity has already been shown to be the most devastating effect for the
network friend count effect $\delta$, there is little point in assessing the impact of homophily on
that area. Instead I examine its impact on the contagion $\beta$ and autocorrelation $\gamma$, moderating
it with the impact of heterogeneity. Figure 4 shows the coverage properties of the estimators
in terms of a density plot of the t-statistic, $(\hat{\beta} - \beta_{true})/\hat{\sigma}_{\beta}$ or $(\hat{\gamma} - \gamma_{true})/\hat{\sigma}_{\gamma}$; an estimator
with the correct coverage probability should line up with the theoretical distribution of the
statistic.
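
As a minimal sketch of how such t-statistics and coverage figures can be computed (illustrative only, and fitted here on the uncensored placeholder network so that coverage should be near nominal), the code below repeats a small simulation and records the standardized error of the contagion estimate. The helper `ols_with_se`, the parameter values and the hard-coded critical value are assumptions of this sketch.

```python
import numpy as np

T_CRIT = 1.985   # approximate 97.5% point of a t distribution with about 95 df

def ols_with_se(design, y):
    """Least-squares coefficients and their estimated standard errors."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = design.shape[0] - design.shape[1]
    cov = (resid @ resid / dof) * np.linalg.inv(design.T @ design)
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(4)
n, reps = 100, 400
truth = np.array([0.0, 0.5, 0.2, 0.1])        # mu, gamma, beta, delta (illustrative)
t_beta = np.empty(reps)
for r in range(reps):
    W = (rng.random((n, n)) < 10 / (n - 1)).astype(float)   # placeholder network
    np.fill_diagonal(W, 0.0)
    Y0 = rng.normal(size=n)
    Y0c, D = Y0 - Y0.mean(), W.sum(axis=1)
    design = np.column_stack([np.ones(n), Y0c, W @ Y0c, D - D.mean()])
    Y1 = design @ truth + rng.normal(size=n)
    coef, se = ols_with_se(design, Y1)
    t_beta[r] = (coef[2] - truth[2]) / se[2]   # t-statistic for the contagion term

coverage = np.mean(np.abs(t_beta) <= T_CRIT)   # share of nominal 95% intervals covering beta
print(f"empirical coverage for beta: {coverage:.3f}")
```

Replacing `W` in the design matrix with a censored sociomatrix, while still generating `Y1` from the true network, gives the comparison against the oracle that the coverage plots summarize.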

Heterogeneity appears to be the driving force for the estimation of the contagion $\beta$. There
does not appear to be an effect of homophily or heterophily on the coverage probability, 
with the exception of high heterophily with low heterogeneity; this is likely related to the 
outdegree-dependence issue. 

Both heterogeneity and homophily/heterophily are associated with the coverage
properties of the autocorrelation term $\gamma$. Coverage increases with additional heterogeneity, the
opposite result. As well, additional dependence on the differences between prior
characteristics decreases coverage, whether or not that dependence is positive or negative; this suggests
that these friendship selections add collinearity to the autocorrelation term by establishing 
friendships that are similar to, or wildly different from, a person's own characteristics, and 
that the censoring mechanism obscures these friendship impacts by making them appear 
similar to the autocorrelation term. 



4 Mechanism 2: Flexible Limits — "Name About k 
Friends" 

For each individual in the study, the maximum number of declared friendships is a random
variable with expected value k; for the sake of exposition, this will be a Poisson random
variable with parameter k, though other mechanisms are possible and produce similar results
(see footnote 3). This will induce heterogeneity in the friendship count for each person, but this heterogeneity
will be driven by the naming mechanism rather than the true friendship count unless there
are large numbers of low-friend individuals. This section also assumes that there is no
relationship between the number of friends a person names and any other properties the person
may have.

It is important to recognize the difference between the intentional application of this 
naming scheme (which may prove to be more difficult) and its accidental application - when 
responders violate their instructions to name (up to) a fixed number of friends. The apparent 
positive benefit to the researcher, in the form of additional variability on the dimension of 
interest, may be an illusion caused by the random variation of the naming. 
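
The illusion can be made concrete with a minimal sketch, assuming a Poisson naming limit and a uniform random choice of which friends to name (the helper `censor_flexible` is hypothetical). Every ego in the placeholder network has exactly ten true friends, so any spread in declared outdegree comes from the naming step alone.

```python
import numpy as np

def censor_flexible(W, k, rng):
    """Each ego draws a Poisson(k) naming limit and names at most that many true friends."""
    limits = rng.poisson(k, size=W.shape[0])
    X = np.zeros_like(W)
    for i, limit in enumerate(limits):
        friends = np.flatnonzero(W[i])
        m = min(int(limit), len(friends))
        if m > 0:
            X[i, rng.choice(friends, size=m, replace=False)] = 1
    return X, limits

rng = np.random.default_rng(5)
n, true_degree = 100, 10
W = np.zeros((n, n), dtype=int)
for i in range(n):                       # zero heterogeneity: everyone has exactly 10 friends
    others = np.delete(np.arange(n), i)
    W[i, rng.choice(others, size=true_degree, replace=False)] = 1

X, limits = censor_flexible(W, k=1, rng=rng)
declared = X.sum(axis=1)
print("true outdegree variance:    ", W.sum(axis=1).var())
print("declared outdegree variance:", declared.var())
```

The declared outdegree now varies across egos even though the true outdegree does not, so a friend-count coefficient becomes estimable while carrying no information about true friend counts.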

3 A binomial scheme was also investigated, and produced virtually identical plots and results. 

Figure 5: The measured effect of $\delta$ as a function of heterogeneity under the flexible naming
scheme. Left: with zero heterogeneity, the measured effect is centered about zero, with a 
mild dependence in the standard deviation as a function of the true value. As heterogeneity 
increases (middle, right), the zero-centered cone is replaced by a linear trend that grows past 
the one-to-one line, with the standard deviation increasing slowly with the absolute value of 
the true effect. 

4.1 Balancing Between Naming Mechanism Noise and Inflation
Effect in the Outdegree Effect $\delta$

Figure 5 shows the effect of increasing heterogeneity on the effect estimate for $\delta$. While there
were three modes of effect in the strict naming case of Section 3.1, the unidentifiability has
been removed, leaving only a centered cone and a linear trend as the typical patterns of the
measured effect against the truth.

In this case, low heterogeneity is wholly associated with the cone pattern - the measured
effect of $\delta$ is centered around zero, rather than the true generative value, and the width of
the effect size increases with the absolute value of the true $\delta$. This is consistent with
the notion that the generative process adds signal to the system, but the naming process
essentially randomizes its direction.

As heterogeneity increases, the number of individuals with zero or one true friends in- 
creases, and the naming process will now more closely reflect the friendship counts of reality, 
but still distorting the signal to a degree. Notably, the inflation factor is considerably less 
than ten. 

Figure 6: Coverage properties of estimators for the network deviation term $\beta$ and
autocorrelation term $\gamma$ under an approximate naming scheme. Nothing substantive has changed for
the coverage probabilities of each type from the strict maximum scenario shown in Figure 4.

4.2 Heterogeneity (Again) Distorts Estimates of The Contagion
$\beta$; Homophily (Again) Affects The Autocorrelation $\gamma$

The conclusions of the previous section have not changed with the implementation of this
new naming scheme. Figure 6 shows the coverage properties of the estimators for $\beta$ and $\gamma$
and shows the same patterns that were shown in Section 3.2.



5 Mechanism 3: Fractional Limits — "Name A Proportion of Your Friends",
Averaging to k Per Respondent

Since complete naming in many network contexts is difficult or expensive, the temptation
to resort to a sampling technique is inevitable. Since varying levels of heterogeneity on
outdegree have been shown to cause difficulties in cases where the naming bound is the same
on everyone, it is worth investigating whether a proportional sampling method would be
a better choice. First, the respondent is asked for the total number of friends they have (see footnote 4);
second, they are asked to name a certain fraction of those friends by name for the study under
consideration. Such a method would theoretically preserve the relative gregariousness of the
respondent in the total response, and hypothetically preserve the scale of the friendship count
effect when incorporated into a model; if for each individual $i$ the relationship is approximated
as

$$\sum_j X_{ij} \approx \frac{\sum_{ij} X_{ij}}{\sum_{ij} W_{ij}} \sum_j W_{ij},$$

then the relative effect of friendship count would be preserved in the average friend ratio
$\sum_{ij} X_{ij} / \sum_{ij} W_{ij}$. Let the estimate for the friendship count effect using the partial network be $\hat{\delta}$; an
immediate adjustment to obtain a "deflated" estimate could then be

$$\delta^* = \hat{\delta} \, \frac{\sum_{ij} X_{ij}}{\sum_{ij} W_{ij}},$$

essentially multiplying the inflated estimate by the fraction of friendships maintained in the sampling
method.
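
A minimal sketch of the fractional scheme and the deflation step, under assumptions that are mine rather than the study's: strong heterogeneity in true outdegree, a pure friend-count effect with no contagion or autocorrelation, a uniform random choice of which friends to name, and a deflation factor equal to the retained share of ties, $\sum_{ij} X_{ij} / \sum_{ij} W_{ij}$. The helper `censor_fractional` is hypothetical.

```python
import numpy as np

def censor_fractional(W, fraction, rng):
    """Each ego names round(fraction * D_i) of their D_i true friends."""
    X = np.zeros_like(W)
    for i in range(W.shape[0]):
        friends = np.flatnonzero(W[i])
        m = int(np.rint(fraction * len(friends)))
        if m > 0:
            X[i, rng.choice(friends, size=m, replace=False)] = 1
    return X

rng = np.random.default_rng(6)
n, delta_true = 400, 0.5
true_degree = rng.integers(0, 41, size=n)          # strong heterogeneity (assumed)
W = np.zeros((n, n), dtype=int)
for i in range(n):
    if true_degree[i] > 0:
        others = np.delete(np.arange(n), i)
        W[i, rng.choice(others, size=true_degree[i], replace=False)] = 1

X = censor_fractional(W, fraction=0.25, rng=rng)
D, Dx = W.sum(axis=1), X.sum(axis=1)
Y1 = 1.0 + delta_true * (D - D.mean()) + rng.normal(size=n)   # pure friend-count effect

design = np.column_stack([np.ones(n), Dx - Dx.mean()])
delta_hat = np.linalg.lstsq(design, Y1, rcond=None)[0][1]     # inflated estimate
delta_star = delta_hat * X.sum() / W.sum()                    # deflated estimate
print(f"true {delta_true:.3f}, partial-network {delta_hat:.3f}, deflated {delta_star:.3f}")
```

In this favourable setting the partial-network estimate is inflated by roughly the inverse of the naming fraction and the rescaling brings it back near the true value; with little heterogeneity the rounding in `censor_fractional` dominates and the correction should not be expected to work as cleanly.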

Indeed, sub-network counts are being used in situations where the total network is
difficult, if not impossible, to estimate (Zheng et al., 2006; McCormick et al., 2010), and such a
sampling method may be sufficient to preserve the total effect of outbound friendships.

I will show that accounting for this level of heterogeneity in the sampling scheme does 
make several things clearer - for example, if heterogeneity is the dominant factor in 
determining network structure, then correcting for the mean effect size when estimating δ is 
possible (though coverage probabilities will be overestimated) - but that it complicates other 
aspects of the analysis. 



5.1 As Heterogeneity Rises, Mean Estimates of The Outdegree 
Coefficient δ Can Be Adjusted 

Figure 7 shows the impact of various levels of heterogeneity on the estimates of δ. At 
the low end of heterogeneity, there is little distinction between the outdegrees of each of the 
individuals in the network, and the "friend count" effect is comparatively small to begin 
with; the rounding caused by the fractional censoring mechanism is large compared with the 
differences between individuals, so that the loss in resolution will diminish the inflation effect, 
though not eliminate it. 



4 The definition of "friend" may prove to be different between people unless a standard methodology is 
applied in the questionnaire; see Zheng et al. (2006) and McCormick et al. (2010) for examples of 
estimating network size. 



Figure 7: Estimates of δ against true generative values under the fractional naming scheme. 
Black points have no generated homophily or heterophily on the attribute in question; pink 
points have homophily on the outcome attribute; cyan points have heterophily on the outcome 
attribute. In the left panel, there is no heterogeneity on the outdegree of individuals, 
and inflation increases when moving from isophily to homophily to heterophily. Middle, 
as a little heterogeneity is introduced, inflation increases though the distinction in 
homophily/heterophily decreases. Right, with much more heterogeneity, the effect has nearly 
reached the expected inflation point (here, a factor of 10) and the distinction between 
homophily, heterophily and isophily has disappeared. 

As the heterogeneity increases, there is far more distinction between individuals on friend 
count than before, and the effect becomes a greater contributor to the total variance in 
the outcome. The "rounding error" decreases, and the situation more closely approaches 
the complete inflation effect: for preserving one-tenth of each person's friend count, the 
measured effect increases tenfold. Altogether, this suggests that the effect would be 
preserved only in cases where the possible magnitude of the signal was strong to begin with, 
unencumbered by the stochastic variation in the sampling mechanism, and where relative 
ratios of friend counts can be preserved in the operation. 
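
One way to see where the tenfold figure comes from, and why rounding pulls the estimate back from it, is the following sketch. The symbols $d_i$ (true friend count), $d^{(k)}_i$ (named friend count), $f$ (fraction named) and $r_i$ (rounding error) are labels chosen for this illustration, and the derivation assumes that $r_i$ and the outcome noise $\varepsilon_i$ are uncorrelated with $d_i$ and with each other, which is only an approximation for a deterministic rounding rule. Suppose

$$d^{(k)}_i = f\, d_i + r_i, \qquad y_i = \mu + \delta\, d_i + \varepsilon_i .$$

The least-squares slope of $y_i$ on the named count is then approximately

$$\frac{\mathrm{Cov}(y_i, d^{(k)}_i)}{\mathrm{Var}(d^{(k)}_i)} = \frac{\delta f\, \mathrm{Var}(d_i)}{f^2\, \mathrm{Var}(d_i) + \mathrm{Var}(r_i)} = \frac{\delta}{f} \cdot \frac{f^2\, \mathrm{Var}(d_i)}{f^2\, \mathrm{Var}(d_i) + \mathrm{Var}(r_i)} .$$

With $f = 1/10$ the leading factor is $10\delta$, the complete inflation effect. The second factor is close to 1 when friend counts vary widely, and falls below 1 when $\mathrm{Var}(d_i)$ is small relative to the rounding error, which is the low-heterogeneity case where inflation is diminished.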

5.2 Systematic Adjustments to Counter Excess Measured Contagion β 

Figure 8 shows the impact of various levels of heterogeneity on the estimates of the contagion 
effect β. In the case where there is no heterogeneity but significant homophily, the estimates of 
β are inflated; as heterogeneity increases, the degree of inflation decreases. The isophilic cases 
also decrease in magnitude as heterogeneity increases, to the point of deflation; heterophilic 
cases tend to be a compromise between the homophilic and isophilic cases. In all profiles, 
the standard deviation of the estimates is proportional to the value of the contagion effect 
used in generating the network. 

Figure 8: Estimates of β against true generative values under the fractional naming scheme. 
Black points are isophilic on the attribute in question; pink points have homophily on the 
outcome attribute; cyan points have heterophily on the outcome attribute. In the left panel, 
there is no heterogeneity on the outdegree of individuals, and inflation increases when moving 
from isophily to homophily to heterophily. Middle, as a little heterogeneity is introduced, 
inflation increases though the distinction in homophily/heterophily decreases. Right, 
with much more heterogeneity, much of the distinction between homophily, heterophily and 
isophily has disappeared. 

As opposed to the outdegree effect δ, these results do not immediately give rise to a 
corrective prescription for estimating the contagion effect β. The case of minimal heterogeneity 
yields a range of transformed estimates, largely inflationary in nature; as heterogeneity 
increases, the effect estimates are reduced in magnitude towards the truth. Homophily clearly 
produces an extra inflationary effect on the size of the contagion. This distinction diminishes 
as heterogeneity increases, yet is still present. Any prescription for correction would likely 
involve the estimation of the uncensored network and its propensities for homophilic 
selection; heterogeneity in this case may be directly estimable with accurate representations of 
the outdegree of each individual. 

6 Conclusions 

This investigation into the consequences of censoring of social network ties is focused on 
single steps: one step away in the social network, one step in time. Network processes that 
exist on a greater scale, both in space and time, will be affected at each scale by the omission 
of ties. The extreme cases used as examples, with a tenfold reduction in the number of named 
ties, are given to demonstrate the phenomena that may result in the analysis of the system 
under a linear model. 

The sampling schemes proposed fall under two categories: a constant maximum outdegree, 
or a measured outdegree proportional to each individual's total. The former is more likely 
to have arisen naturally in many naming schemes, including the studies mentioned in 
Section 1; the latter may prove to be a workable solution as it may naturally preserve much 
of the underlying geometry, but is largely a hypothetical implementation at this point and is 
presented for demonstration purposes as much as a proposed method of compromise between 
large sampling costs and losses of information. 

6.1 Consequences on Existing Studies 

As existing studies have varying levels of censoring on their outdegree, it remains to be seen 
how the censoring of outdegree affects each case. Censoring appears to have had minimal 
impact on the Add Health study, with at least 75% of respondents having named fewer 
friends in a category than the upper limit (assuming that the naming mechanism did not 
affect the naming of friends below the limit) and with the likelihood that those respondents 
who reached the limit would not have gone far beyond it if the option were given. 

The Framingham Social Network, on the other hand, has considerable questions left to be 
answered about its naming structure, since the true distribution of friends cannot be easily 
assessed, especially since almost every respondent was shown to name at least one friend 
(even if said friend was not also in the Framingham study). The real hope for recovery, in 
this case, is in the violators who named in excess of one friend - at least one respondent 
named six people at one time - though the extent of these violations is unknown. Given that 
the degree of censoring is likely to have been respected by the vast majority of respondents, 
attempting to reconstruct their hypothetical outdegree, let alone the complete network, may 
introduce more error to the estimations than simply leaving them be. 

This analysis used two differently specified network effects: first, the notion that simply 
having more friends will create an effect, and second, that a friend who is above the average 
level of the characteristic will have an influence on raising that characteristic. This is one 
interpretation of two dimensions of network effects, and many studies may have other 
interpretations of these dimensions. The binary traits under investigation are typically pursued 
only through a single indicator, whereas the inclusion of a separate friend count variable may 
prove useful in separating its impact from the overall balance effect; whether or not the 
effect can be shown to be significantly different from zero, it may reduce confounding. 
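
As an illustration of keeping the two effects separate, the sketch below builds a friend count regressor and a "balance" regressor (the sum over friends of their deviation from the mean of the characteristic) and fits both in one linear model. The functional form, the variable names, the network generator and every parameter value are assumptions made for this example, representing one reading of the two effects described above and not the exact specification or simulation design of this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
gamma, delta, beta = 0.3, 0.05, 0.1   # hypothetical generative values

# Random directed network; each row has its own tie probability, giving
# heterogeneous out-degree. Self-ties are removed.
tie_prob = rng.uniform(0.02, 0.10, size=(n, 1))
adj = (rng.random((n, n)) < tie_prob).astype(float)
np.fill_diagonal(adj, 0.0)

y_prev = rng.normal(0.0, 1.0, size=n)   # characteristic at the previous time point
center = y_prev.mean()                  # central value used for deviations

friend_count = adj.sum(axis=1)          # effect 1: simply having more friends
balance = adj @ (y_prev - center)       # effect 2: friends above or below the centre

y_next = (1.0 + gamma * y_prev + delta * friend_count + beta * balance
          + rng.normal(0.0, 0.5, size=n))

# Fit all terms jointly so the friend count effect is not absorbed elsewhere.
design = np.column_stack([np.ones(n), y_prev, friend_count, balance])
coef, *_ = np.linalg.lstsq(design, y_next, rcond=None)
_, gamma_hat, delta_hat, beta_hat = coef
print(f"gamma {gamma_hat:.3f}, delta {delta_hat:.3f}, beta {beta_hat:.3f}")
```

Fitting on the full, uncensored network as done here gives a baseline against which estimates from a censored version of `adj` could be compared.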